fix(sessions): durably write-ahead the start prompt so it can't be lost#2666
fix(sessions): durably write-ahead the start prompt so it can't be lost#2666pauldambra wants to merge 4 commits into
Conversation
Starting a local task run can lose the user's prompt: it only exists in memory until the session has fully initialized and `session/prompt` has been delivered. If `agent.start` exceeds the 30s `SESSION_VALIDATION_TIMEOUT_MS` (common in large monorepos) or the app is reloaded/quit/crashes during the retry window, the prompt is gone and the run is stranded in `queued`. Persist the prompt as a durable write-ahead record BEFORE any session/agent work, and clear it only once it has actually been delivered: - New `PendingPromptStore` contract in core and an `electronStorage`-backed zustand store in ui (keyed by taskId, mirrors sessionConfigStore). - `createNewLocalSession` writes the prompt ahead before `createTaskRun` / `agent.start`, records the taskRunId once the run exists, recovers a saved prompt when a later connect arrives without one, and clears the record after `sendPrompt` resolves. A persisted record means "this prompt is owed delivery", which is also the hook for a future server-side reconciler to re-drive orphaned runs. Generated-By: PostHog Code Task-Id: 22d61751-8d18-4d56-a79e-4741045320c1
|
React Doctor found no issues in the changed files. 🎉 Reviewed by React Doctor for commit |
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
packages/core/src/sessions/sessionService.ts:1085-1130
**Duplicate save object and drifting `createdAt`**
All nine fields in the first `pendingPrompts.save` call are repeated verbatim in the second — the only difference is `taskRunId`. Because both calls evaluate `recovered?.createdAt ?? Date.now()` independently, the second call's `createdAt` is captured *after* the `await client.createTaskRun(taskId)` round-trip, so the value that actually lands in persistent storage is later than "when the prompt was first written ahead". If this timestamp is ever used for staleness checks or a future reconciler, it will undercount age. Building the base record once before both saves fixes the duplication and pins the timestamp correctly.
### Issue 2 of 2
packages/ui/src/features/sessions/sessionServiceHost.test.ts:189-201
**Write-ahead behaviour is not asserted in the host integration tests**
The `pendingPromptStore` mock is wired in but its methods are never checked in any `connectToTask` test. The new critical-path behaviours — `save` called before `createTaskRun`, `save` called again with `taskRunId` after the run exists, `remove` called only after `sendPrompt` resolves, and the recovery path where `get` returns a saved record and re-drives the prompt — are all exercised by the production code under test but invisible to the suite. A future regression here would pass every existing test.
Reviews (1): Last reviewed commit: "fix(sessions): durably write-ahead the s..." | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1e9cc0082f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
pauldambra
left a comment
There was a problem hiding this comment.
Note
🤖 Automated comment by QA Swarm — not written by a human
QA Swarm multi-perspective review (paul-reviewer, xp-reviewer, security-audit). qa-team was unavailable and skipped. See inline comments + the summary comment.
|
Note 🤖 Automated comment by QA Swarm — not written by a human Multi-perspective review: paul-reviewer, xp-reviewer, security-audit (qa-team unavailable, skipped) Verdict: 💬 APPROVE WITH NITSSolid, well-motivated change that slots cleanly into the existing Key findings
Convergence
Reviewer summaries
Automated by QA Swarm — not a human review |
…overy, soften wording
From qa-swarm review of this PR:
- Build the PendingPromptRecord once and re-save with `{ ...pending, taskRunId }`
instead of assembling the 9-field payload twice (convergent paul + xp).
- Log when a written-ahead prompt is actually recovered, so the safety net is
observable in the wild.
- Soften the "impossible to lose" wording in the doc + comment: persistence is
async/best-effort, so it's "very unlikely", not guaranteed.
Deferred to a follow-up: an integration test exercising the recovery branch of
createNewLocalSession (currently the store mock returns undefined everywhere).
Generated-By: PostHog Code
Task-Id: 22d61751-8d18-4d56-a79e-4741045320c1
|
Note 🤖 Automated triage by PR Shepherd — not written by a human Actioned the qa-swarm findings in bc252db:
Deferred (author's call):
Tests stay green: core 162 (sessions), ui host+store suites 122. |
…elivering From Codex review of this PR: sendPrompt can resolve without actually delivering — "queued" (session busy) or "rate_limited" (usage-limit error swallowed by sendLocalPrompt) — but we cleared the write-ahead record unconditionally after it resolved, reintroducing a narrow loss window. Only clear the record on confirmed delivery. Generated-By: PostHog Code Task-Id: 22d61751-8d18-4d56-a79e-4741045320c1
|
Note 🤖 Automated triage by PR Shepherd — not written by a human Triaged the Codex + Greptile findings (they converged with qa-swarm on the duplicate
Net: two real fixes applied, the deeper resume-path recovery consciously deferred to the reconciler follow-up. Re-pushed at ebc75e6. |
There was a problem hiding this comment.
Two unresolved bot inline comments raise concrete behavioral gaps that remain in the current diff: (1) on a rate-limit response, sendPrompt resolves without throwing but the prompt was never delivered — the unconditional pendingPrompts.remove() afterwards silently discards the write-ahead record the PR exists to preserve; (2) recovery logic only runs in createNewLocalSession, so if the app restarts after createTaskRun succeeds but before sendPrompt delivers (the task then has a latest_run.id), doConnect takes the reconnectToLocalSession path and the pending-prompt store is never consulted. Neither issue is a regression, but both undercut the stated safety guarantee and warrant human review before merge.
There was a problem hiding this comment.
The PR adds write-ahead durability for prompts, but the recovery logic is only wired into createNewLocalSession. If the app restarts after createTaskRun succeeds but before sendPrompt delivers, the task already has a run ID and doConnect takes the resume-existing route into reconnectToLocalSession — which never checks pendingPrompts. The pending record sits in storage unrecovered. Bot reviewers raised this as P1 and it visibly remains unaddressed in the current diff. This is the crash scenario the PR most needs to handle.
…th tests Resolves the remaining review findings on this PR: - Codex P1: a run stranded before its prompt was delivered routes to the resume-existing path on the next connect, not createNewLocalSession. Deliver the owed write-ahead prompt there too, once the resumed log is confirmed to carry no prompt yet. Extracted a shared deliverWrittenAheadPrompt() helper so the deliver-then-clear-on-confirmed-delivery logic lives once. - xp YAGNI: dropped the speculative list()/taskRunId surface (and the second save that only existed to populate taskRunId) — the reconciler that would consume them isn't in this PR. This also removes the duplicated save entirely. - Tests: cover the recovery branch end-to-end against the real store — delivery on the create-new route, delivery on the resume-existing route, and no delivery when the resumed run already has a prompt event. removePrompt's guard divergence from sessionConfigStore is intentional (the guard is the safer side), left as-is. Generated-By: PostHog Code Task-Id: 22d61751-8d18-4d56-a79e-4741045320c1
|
Note 🤖 Automated triage by PR Shepherd — not written by a human Closed out the remaining review threads in 9ec4d33:
All review threads are now actioned or resolved-by-design. Local: core 162 + ui sessions 224 green; core + ui typecheck clean. |
There was a problem hiding this comment.
All P1/P2 bot concerns are addressed in the current diff: recovery on the reconnect route is present, deliverWrittenAheadPrompt correctly retains the write-ahead record on rate-limit and queued outcomes, and the key recovery paths are covered by new integration tests. Architecture follows the platform-interface-in-core / implementation-in-ui / wired-at-composition-seam pattern correctly.
Problem
Starting a local task run can lose the user's prompt. The prompt only exists in renderer memory until the session has fully initialized and
session/prompthas been delivered (a separate step afteragent.start). If session init exceeds the 30sSESSION_VALIDATION_TIMEOUT_MS— common in large monorepos, where loading skills/commands blows the budget — or the app is reloaded/quit/crashes during the auto-retry window, the prompt is gone and the run is left stranded inqueued.This is the root cause behind "I lost my prompt while starting a task" reports in the monorepo: session init repeatedly times out and the only copy of the prompt is in-memory state tied to a session that never came up.
Why: the prompt is the one thing in the start-a-task flow a user can't cheaply reproduce, yet nothing durable owned it before delivery.
Changes
Persist the prompt as a durable write-ahead record before any session/agent work, and clear it only once it has actually been delivered.
PendingPromptStorecontract in@posthog/coreand anelectronStorage-backed zustand store in@posthog/ui(keyed bytaskId, mirrors the existingsessionConfigStore).createNewLocalSessionnow: writes the prompt ahead beforecreateTaskRun/agent.start; records thetaskRunIdonce the run exists; recovers a saved prompt when a later connect arrives without one; clears the record aftersendPromptresolves.queuedruns.Out of scope (follow-ups): bumping/adapting the 30s init timeout, and the server-side reconciler + startup sweep that extends the guarantee to the case where the device never returns.
How did you test this?
pnpm --filter @posthog/core typecheckandpnpm --filter @posthog/ui typecheck— clean.biome checkon all touched files — clean.pendingPromptStore.test.ts).Automatic notifications
Created with PostHog Code from a Slack thread